Exploring Large Document Collections with a Dynamic Hierarchy
Abstract
We present a method for handling large document collections that better accounts for user needs. The user chooses the desired granularity of the hierarchy and then applies this hierarchy to the document collection to classify the documents. The granularity depends directly on the number of clusters. The clusters are presented to the user in two different ways: (1) as a representative document of the cluster; (2) as a set of keywords characterizing all documents of the
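The abstract does not specify the clustering algorithm, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes plain term-frequency vectors, cosine similarity, and a simple agglomerative merge that stops once the requested number of clusters is reached (the granularity). The function names `cluster_documents` and `describe_cluster` are hypothetical and chosen for illustration.

```python
import math
from collections import Counter


def tf_vector(text):
    """Term-frequency vector of a document (assumption: whitespace tokens)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroid(vectors):
    """Summed term vector; cosine is scale-invariant, so no division needed."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return total


def cluster_documents(docs, k):
    """Merge the two most similar clusters until only k clusters remain.

    k plays the role of the user-chosen granularity: a larger k gives
    a finer partition of the collection.
    """
    vecs = [tf_vector(d) for d in docs]
    clusters = [[i] for i in range(len(docs))]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                sim = cosine(
                    centroid([vecs[m] for m in clusters[i]]),
                    centroid([vecs[m] for m in clusters[j]]),
                )
                if best is None or sim > best[0]:
                    best = (sim, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters, vecs


def describe_cluster(members, vecs, docs, n_keywords=2):
    """Return the two views named in the abstract: a representative
    document (here: the member closest to the centroid) and a keyword set
    (here: the most frequent terms in the cluster)."""
    c = centroid([vecs[m] for m in members])
    representative = max(members, key=lambda m: cosine(vecs[m], c))
    ranked = sorted(c.items(), key=lambda kv: (-kv[1], kv[0]))
    keywords = [term for term, _ in ranked[:n_keywords]]
    return docs[representative], keywords


if __name__ == "__main__":
    sample = [
        "cats purr cats sleep",
        "cats sleep all day",
        "stocks rise stocks fall",
        "stocks fall on news",
    ]
    groups, vectors = cluster_documents(sample, k=2)
    for members in groups:
        print(describe_cluster(members, vectors, sample))
```

Calling `cluster_documents` again with a different `k` changes the granularity without changing the rest of the pipeline. A real system would likely add stop-word removal and TF-IDF weighting, both omitted here for brevity.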
Similar resources
Exploring Large Digital Library Collections Using a Map-Based Visualisation
In this paper we describe a novel approach for exploring large document collections using a map-based visualisation. We use hierarchically structured semantic concepts that are attached to the documents to create a visualisation of the semantic space that resembles a Google Map. The approach is novel in that we exploit the hierarchical structure to enable the approach to scale to large document...
Automatic keyword extraction using Latent Dirichlet Allocation topic modeling: Similarity with golden standard and users' evaluation
Purpose: This study investigates the automatic keyword extraction from the table of contents of Persian e-books in the field of science using LDA topic modeling, evaluating their similarity with golden standard, and users' viewpoints of the model keywords. Methodology: This is a mixed text-mining research in which LDA topic modeling is used to extract keywords from the table of contents of sci...
Exploration of Text Collections with Hierarchical Feature
Document classification is one of the central issues in information retrieval research. The aim is to uncover similarities between text documents. In other words, classification techniques are used to gain insight in the structure of the various data items contained in the text archive. In this paper we show the results from using a hierarchy of self-organizing maps to perform the text classificat...
Learning Topic Hierarchies and Thematic Annotations from Document Collections
Large textual and multimedia databases are now widely available but their exploitation is restricted by the lack of metainformation about their structure and semantics. Many such collections like those gathered by most search engines are loosely structured. Some have been manually structured, at the expense of an important effort. This is the case of hierarchies like those of internet portals (...
Summarization of Changes in Dynamic Text Collections
Information Retrieval is the Informatics field primarily focused on all problems and challenges related to information storage and access. The large majority of works in this area are based on static collections of documents. However, many of these collections are dynamic, and have evolved over time with documents being added, edited or simply removed at different times. Even in highly dynamic ...